
Add sampling API back to LlamaTokenDataArray; Add DRY and XTC Samplers #594

Merged: 19 commits into utilityai:main on Dec 9, 2024

Conversation

@nkoppel (Contributor) commented Dec 4, 2024

The changes in 0.1.84 removed the ability to use samplers with LlamaTokenDataArrays. This PR adds the sampling methods from version 0.1.83 back to LlamaTokenDataArray, adds support for the XTC and DRY samplers, and allows users to apply a LlamaSampler to a LlamaTokenDataArray.

These changes enable users to mix custom samplers with llama.cpp samplers, and make the LlamaTokenDataArray struct useful again.
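A minimal sketch of that flow; `apply_sampler` and `top_k` are assumed names for the restored API rather than a verified excerpt:

```rust
use llama_cpp_2::sampling::LlamaSampler;
use llama_cpp_2::token::data_array::LlamaTokenDataArray;

fn filter_candidates(candidates: &mut LlamaTokenDataArray) {
    // Custom step: mutate the candidate list directly in Rust first,
    // e.g. suppress unwanted tokens by lowering their logits.

    // Then run a llama.cpp sampler over the same candidates
    // (hypothetical constructor and method names).
    let mut top_k = LlamaSampler::top_k(40);
    candidates.apply_sampler(&mut top_k);
}
```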

Let me know if you'd like to see any changes!

@MarcusDunn (Contributor) left a comment

This looks good. I've got a couple of comments that don't require changes, but that I would like your thoughts on before I merge.

@nkoppel (Contributor, Author) commented Dec 7, 2024

I've gone ahead and overhauled the sampling API in a way that resolves your third review comment, though it is very different from what came before. In my latest version, a LlamaSampler can be defined as either a chain or a single sampler, as determined by LlamaSamplerParams, an enum that holds all parameters needed to construct a LlamaSampler. I used this to factor out all of the LlamaSampler::add_ and LlamaTokenDataArray::sample_ methods in favor of selecting the sampler through the enum. This deduplicates a lot of code without making the interface much clunkier than it was.

If you think that this API is too far from the raw API or complicates things too much, let me know.
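For illustration, the enum-based construction described above might look roughly like this; the variants and fields are guesses for the sake of example, not the PR's actual definitions:

```rust
/// Illustrative guess at the shape of `LlamaSamplerParams`.
pub enum LlamaSamplerParams<'a> {
    /// A chain of samplers applied in order.
    Chain { stages: &'a [LlamaSamplerParams<'a>] },
    TopK(i32),
    Temp(f32),
    /// XTC sampler parameters.
    Xtc { probability: f32, threshold: f32, min_keep: usize, seed: u32 },
    /// Final stage: sample a token from the remaining distribution.
    Dist { seed: u32 },
}
```

A single constructor such as `LlamaSampler::new(params)` (hypothetical name) would then replace the per-sampler `add_*` and `sample_*` methods.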

@nkoppel requested a review from @MarcusDunn on Dec 7, 2024 at 20:32
@MarcusDunn (Contributor) left a comment

I think there's a bit of thinking to do around lifetimes before this gets merged.

If you want to remove the samplers that require lifetime logic, it would be nice to get a first iteration in with the trivial samplers; then we can add the more complex ones in a separate PR. There's a lot of great work here that I'd hate to see kept out of main because we're battling to encode model-lifetime guarantees into the sampler.

(Inline review threads on llama-cpp-2/src/sampling.rs, since resolved.)
@MarcusDunn (Contributor) commented
I don't mind the new API. I weakly preferred the "closer to direct calls" one.

@MarcusDunn (Contributor) commented Dec 8, 2024

I think a LlamaSampler struct with a collection of factory functions would be preferable to the params + enum approach. This would also allow different lifetime signatures for different samplers (for example, have most return LlamaSampler<'static> when the sampler owns everything, and LlamaSampler<'a> when it borrows a model).
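A compilable sketch of that design, with stand-in types and assumed constructor names:

```rust
use std::marker::PhantomData;

// Stand-ins for the real types; actual code would wrap llama.cpp's
// `llama_sampler` pointer and use llama-cpp-2's `LlamaModel`.
struct RawSampler;
struct LlamaModel;

pub struct LlamaSampler<'a> {
    raw: RawSampler,
    // Ties the sampler to any model data it borrows.
    _model: PhantomData<&'a LlamaModel>,
}

impl LlamaSampler<'static> {
    /// Owns all of its state, so it can outlive every model.
    pub fn temp(_t: f32) -> Self {
        LlamaSampler { raw: RawSampler, _model: PhantomData }
    }
}

impl<'a> LlamaSampler<'a> {
    /// Borrows the model (DRY, for instance, needs vocabulary access),
    /// so the returned sampler cannot outlive it.
    pub fn dry(_model: &'a LlamaModel, _multiplier: f32) -> LlamaSampler<'a> {
        LlamaSampler { raw: RawSampler, _model: PhantomData }
    }
}
```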

@nkoppel (Contributor, Author) commented Dec 8, 2024

Okay, I've rewritten the API again, and I think this is about as simple as it will get while still allowing LlamaSampler to be either a chain or a single sampler. It is also about as close to the raw API as it was before this PR. I will write documentation for the new methods on LlamaSampler later today.
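Roughly how the rewritten API reads in use; the constructor names (`chain_simple`, `top_k`, `temp`, `dist`) are my best guess at the final method set rather than a verified excerpt of the merged code:

```rust
use llama_cpp_2::sampling::LlamaSampler;

fn build_sampler() -> LlamaSampler {
    // A chain behaves like a single sampler; each stage narrows or
    // reweighs the candidates, and the last stage picks a token.
    LlamaSampler::chain_simple([
        LlamaSampler::top_k(40),
        LlamaSampler::temp(0.8),
        LlamaSampler::dist(1234), // seeded final token selection
    ])
}
```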

@MarcusDunn (Contributor) left a comment

Minor nitpicks. This looks good. Address or dismiss as you see fit, and once you're happy I will merge as long as the tests pass.

I think this is also one of the parts of the library that can reasonably be tested (stuff that does inference or loads a model is too hard), so a couple of tests (preferably doc tests) would be lovely.
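For instance, a doc test along these lines; the calls shown are assumptions about the final API, and the function is a placeholder:

````rust
/// Keeps only the `k` most likely candidates.
///
/// ```
/// use llama_cpp_2::sampling::LlamaSampler;
/// let sampler = LlamaSampler::top_k(1);
/// // Build a LlamaTokenDataArray from logits, apply the sampler,
/// // and assert that only the top candidate remains.
/// ```
fn _doc_test_example() {}
````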

Thanks for the great PR and patience with the back and forth.

@MarcusDunn merged commit cf69db5 into utilityai:main on Dec 9, 2024
2 of 5 checks passed